SYSTEM AND METHOD FOR AGENT REPORTING IN TO SERVER

TECHNICAL FIELD OF THE INVENTION

This invention relates to computer system administration and management, and, in particular,
to determining the status of multi-server management agents.

BACKGROUND OF THE INVENTION

Administration of large, multi-server computing environments is a field of growing interest as
the number and size of large, multi-server computing environments grows. The field of multi-server
system administration and management focuses on maintaining the physical operation of a multitude of
computer systems, often referred to as nodes, connected in a network. These management tasks
include a number of functions, including adding, modifying and removing nodes, users, tools, and roles;
defining groups of nodes; authorizing users to perform operations on nodes; installing, maintaining and
configuring hardware; installing and upgrading operating system and application software; and applying
software patches, among other functions.

Several powerful software applications that assist and centralize the management of large,
multi-server computing environments have been developed in the field. Generally these applications have
included a single, large multi-server management application running on a single centrally located
management server operated by one or more system administrators, and, in only a few implementations,
separate management agent applications running on each of the nodes in the multi-server computing
environment.

In such a configuration, the large, central multi-server management application running on a
centrally located management server is generally responsible for communicating with the separate
management agent applications running on each of the nodes in order to determine the status of any
management tasks being performed on each of the nodes. The central multi-server management
application is thus required to constantly query the separate management agent applications on each
of the nodes. This results in growing demand on network bandwidth as the central multi-server
management application must query more and more nodes.

Another result of this arrangement is increasing wait times as the central multi-server
management application must wait for responses from each of the nodes before proceeding with other
tasks. In addition, the failure of any management agent, or a sudden failure of a node on which a
management agent is performing a task, may cause the central multi-server management application to
become caught in an indefinite loop waiting for a response from an inactive agent. Furthermore, the
central multi-server management application may also be interrupted by the routine removal of a node
from service in order to perform a hardware or operating system software upgrade and may not be
made aware of the occurrence or nature of the upgrade upon the return of the node to service.

SUMMARY OF THE INVENTION 

In one respect, what is described is a system for managing a multiple server computer system 
on a computer network. The system includes a central management server and one or more remote 
nodes connected to the central management server. The central management server further comprises 
a processor for executing programs, a main memory for storing currently executing program code, and 
a secondary storage device for storing program code and data. Each remote node further comprises 
a processor for executing programs, a main memory for storing currently executing program code, and
a secondary storage device for storing program code and data. The system also includes a distributed 
task facility that assigns and monitors system management tasks on the remote nodes, running on the 
processor in the central management server, and an agent, running on the processor in each remote 
node, that executes system management tasks and initiates contact with the central management server 
to report the properties of the remote node on which it is running. 

In another respect, what is described is a method for managing a multiple server computer 
system on a computer network, wherein an agent running on a node initiates contact with a central 
management server to report the properties of the remote node to the central management server. The 
method includes steps for executing an agent on a remote node and creating a properties object 
containing information relating to certain properties of the remote node on which the agent is executing. 
The method also includes steps for the agent initiating contact with a central management server, and 
the agent passing the properties object from the agent to the central management server, whereby the
agent reports the properties of the remote node on which it is executing to the central management
server.

In yet another respect, what is described is a computer readable medium on which is embedded
a program. The embedded program includes instructions for executing the above method.

Those skilled in the art will appreciate these and other advantages and benefits of various
embodiments of the invention upon reading the following detailed description of a preferred
embodiment with reference to the below-listed drawings.

BRIEF DESCRIPTION OF DRAWINGS

Figure 1 is a block diagram of a computer system on which the present invention may be run.

Figure 2 is a diagram of one embodiment of a system according to the present invention.

Figure 3 is a flowchart of one embodiment of a method according to the invention.

DETAILED DESCRIPTION OF THE INVENTION

Figure 1 shows a network system 10 on which the present invention may be run. The network
system 10 comprises a ServiceControl Manager ("SCM") 12 running on a Central Management Server
("CMS") 14 and one or more nodes 16 managed by the SCM 12 on the CMS 14. Together the one
or more nodes 16 managed by the SCM 12 make up an SCM cluster 17. A group of nodes 16 may
be organized as a node group 18. A node 16 preferably comprises a server or other computer system.

The CMS 14 preferably is an HP-UX 11.x server running the SCM 12 software. The CMS
14 includes a memory (not shown), a secondary storage device 141, a processor 142, an input device
(not shown), a display device (not shown), and an output device (not shown). The memory, a
computer readable medium, may include RAM or similar types of memory, and it may store one or
more applications for execution by processor 142, including the SCM 12 software. The secondary
storage device 141, a computer readable medium, may include a hard disk drive, floppy disk drive,
CD-ROM drive, or other types of non-volatile data storage. The processor 142 executes the SCM
12 software and other application(s), which are stored in memory or secondary storage, or received
from the Internet or other network 24. An exemplary SCM 12 is programmed in the Java
programming language and operates in a Java environment. For a description of an exemplary SCM
12, see ServiceControl Manager Technical Reference, HP part number: B8339-90019, which is
incorporated herein by reference and which is accessible at
http://www.software.hp.com/products/scmgr.

Generally, the SCM 12 supports managing a single SCM cluster 17 from a single CMS 14.
All tasks performed on the SCM cluster 17 are initiated on the CMS 14 either directly or remotely, for
example, by reaching the CMS 14 via a web connection 20. Therefore, a workstation 22 at which a
user interacts with the system only needs a web connection 20 over a network 24 to the CMS 14 in
order to perform tasks on the SCM cluster 17. The workstation 22 preferably comprises a display,
a memory, a processor, a secondary storage, an input device and an output device. In addition to the
SCM 12 software and the HP-UX server described above, the CMS 14 may also include a data
repository 26 for the SCM cluster 17, a web server 28 that allows web access to the SCM 12, a depot
30 comprising products used in the configuring of nodes, and an I/UX server 32. Java objects
operating in a Java Virtual Machine ("JVM") can provide the functionality of this exemplary SCM 12.

Object-oriented programming is a method of programming that pairs programming tasks and
data into re-usable chunks known as objects. Each object comprises attributes (i.e., data) that define
and describe the object. Java classes are meta-definitions that define the structure of a Java object.
Java classes when instantiated create instances of the Java classes and are then considered Java
objects. Methods within Java objects are called to get or set attributes of the Java object and to
change the state of the Java object. Associated with each method is code that is executed when the
method is invoked. In addition to the Java programming language, objects and object classes can be
implemented with other programming languages.
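
The terms used in the preceding paragraph can be illustrated with a minimal sketch. The class
NodeRecord and its hostName attribute are hypothetical names chosen only for illustration; they are
not part of the SCM 12 software.

```java
// Hypothetical class used only to illustrate the terminology; it is not an SCM class.
class NodeRecord {
    // Attribute (data) that defines and describes the object.
    private String hostName;

    // Constructor: executed when the class is instantiated to create an object.
    NodeRecord(String hostName) {
        this.hostName = hostName;
    }

    // Method that gets an attribute of the object.
    String getHostName() {
        return hostName;
    }

    // Method that sets an attribute, thereby changing the state of the object.
    void setHostName(String hostName) {
        this.hostName = hostName;
    }
}
```

Instantiating the class, for example with new NodeRecord("node-a"), yields an object whose state
can then be read or changed through its methods.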

Figure 2 is a diagram of one embodiment of a system 200 according to the present invention.
The primary components of the system 200 are an SCM 12 running on the processor 142 of a CMS
14 and a ServiceControl Manager Agent ("SCM Agent") 220 running on a remote node 16. The
remote node 16 is preferably a server which includes a main memory 227, a secondary storage 228,
a processor 225, an input device (not shown), a display device (not shown), and an output device (not
shown).



The SCM 12 preferably runs under the control of a server operating system 230, which may
be a version of the UNIX operating system, such as Hewlett-Packard's HP-UX operating system, or
any other version of the UNIX operating system, or other server operating system. In the system 200,
the SCM 12 comprises several modules performing discrete multi-system management tasks, including
a distributed task facility 240, a node manager 250, and a log manager 255.

The distributed task facility 240 is a module of the SCM 12 responsible for remote execution
of tools and tasks on the remote nodes 16 and for communicating with the SCM Agents 220 on the
remote nodes 16. The node manager 250 is a module of the SCM 12 responsible for managing node
objects. The log manager 255 is a module of the SCM 12 responsible for logging the results and status
of tasks and operations performed by the various other components of the SCM 12.

The SCM Agent 220 runs on a processor 225 of the remote node 16 under the control of a
server operating system 235, such as those identified above, or other server operating system. The
SCM Agent 220 comprises several modules including a reporting module 260, a task module 270 and
a properties module 280. The reporting module 260, task module 270 and properties module 280
may preferably be implemented as Java classes. As previously noted, Java classes are meta-definitions
that define the structure of a Java object.

The task module 270 is responsible for accepting and executing system management tasks 
assigned to the SCM Agent 220 by the SCM 12. The properties module 280 is responsible for 
determining the properties of the remote node 16 on which the SCM Agent 220 is running. The 
reporting module 260 is responsible for reporting results obtained from the properties module 280, 
including the status of the SCM Agent 220, to the SCM 12. The SCM Agent 220, through the 
reporting module 260, initiates contact with and reports in to the distributed task facility 240 on the
CMS 14, rather than idling until it is queried by the CMS 14. 

When the SCM Agent 220 is started up on the remote node 16, the properties module 280
of the SCM Agent 220 determines selected properties of the node 16 on which it is running, including,
for example, the hardware configuration of the node 16, the network name and address of the node
16, the type and version number of the server operating system 235 under which the SCM Agent 220
is running, and the version number and status of the SCM Agent 220. Any operating characteristic of
the node 16, hardware, software or otherwise, may be considered a property that can be determined
and reported by the SCM Agent 220.
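
As a rough illustration of the kinds of properties listed above, the following sketch gathers a few
of them using only standard Java library calls. It is an assumption made for illustration: the
class name NodePropertiesCollector and the property keys are hypothetical, and the text elsewhere
describes a shell script, not Java code, as the preferred way of gathering this data.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Properties;

// Hypothetical helper; the class name and the property keys are illustrative assumptions.
class NodePropertiesCollector {
    // Gathers a small set of node properties into a java.util.Properties table.
    static Properties collect(String agentVersion) {
        Properties p = new Properties();
        // Type and version of the operating system under which the agent runs.
        p.setProperty("os.name", System.getProperty("os.name", "unknown"));
        p.setProperty("os.version", System.getProperty("os.version", "unknown"));
        // One simple hardware characteristic visible to the JVM.
        p.setProperty("cpu.count",
                Integer.toString(Runtime.getRuntime().availableProcessors()));
        // Version of the agent itself, supplied by the caller.
        p.setProperty("agent.version", agentVersion);
        try {
            // Network name and address of the node.
            InetAddress local = InetAddress.getLocalHost();
            p.setProperty("node.name", local.getHostName());
            p.setProperty("node.address", local.getHostAddress());
        } catch (UnknownHostException e) {
            // Name resolution can fail on a misconfigured host; record that fact.
            p.setProperty("node.name", "unknown");
            p.setProperty("node.address", "unknown");
        }
        return p;
    }
}
```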

These and other properties determined by the user are then recorded and stored in a properties
file, preferably on the secondary storage 228, by the SCM Agent 220 and reported by the reporting
module 260 to the distributed task facility 240. The distributed task facility 240 writes the properties
of the remote node 16 reported by the SCM Agent 220 to a file or other storage device that is
electronically accessible via the network system 10 to all other modules of the SCM 12, including the
node manager 250. The SCM 12 can then determine if there are any tasks that had previously been
assigned to the SCM Agent 220 for which it has not yet received a response. From this the SCM 12
can determine if the node 16 or the SCM Agent 220 have failed and been re-started. Furthermore,
from the properties passed to the SCM 12 by the SCM Agent 220, the SCM 12 can determine,
among other things, whether the hardware configuration of the node 16 on which the SCM Agent 220
is running has changed or been upgraded, whether the SCM Agent 220 software has been changed
or upgraded, and whether the operating system software 235 running on the node 16 has been
changed, patched or upgraded.
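
The text does not say how the SCM 12 detects such changes. One plausible approach, shown here
purely as an assumption, is to compare the newly reported properties with the values recorded from
the previous report. The class name PropertyDiff and the method changedKeys are hypothetical.

```java
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; comparing against the prior report is an assumed technique.
class PropertyDiff {
    // Returns the names of properties whose values differ between the previous
    // report and the current one, including properties added or removed.
    static Set<String> changedKeys(Properties previous, Properties current) {
        Set<String> names = new TreeSet<>();
        names.addAll(previous.stringPropertyNames());
        names.addAll(current.stringPropertyNames());

        Set<String> changed = new TreeSet<>();
        for (String name : names) {
            String before = previous.getProperty(name);
            String after = current.getProperty(name);
            // A property counts as changed if it appeared, disappeared, or has a new value.
            if (before == null ? after != null : !before.equals(after)) {
                changed.add(name);
            }
        }
        return changed;
    }
}
```

A changed operating system version key, for instance, would suggest that the node was patched or
upgraded while it was out of service.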

The reporting module 260 preferably reports the properties of the node 16 to the distributed
task facility 240 by passing a properties object containing property values from the properties file
created by the SCM Agent 220.

Figure 3 is a flowchart of one embodiment of a method 300 according to the present invention.
When a remote node 16 is initially started up, or when it is restarted after a failure or planned outage,
the SCM Agent 220 is started up (step 305). In one embodiment of the present invention, the SCM
Agent 220 may be started when the remote node 16 is restarted, i.e., rebooted, or by request through
a UNIX init(1m) process, or in other ways. In this embodiment of the present invention, the SCM
Agent 220, upon startup, runs the properties module 280, preferably implemented as a UNIX shell
script, to gather data on the properties of the remote node 16, and then instantiates a JVM which
further instantiates an SCM Agent object 220. The SCM Agent object 220 takes over further steps
of the method 300.



HP No. 10007360-1 






Following startup of the SCM Agent 220, the SCM Agent 220 creates a properties file (step
310) on the remote node 16, preferably on the secondary storage 228, containing values associated
with selected properties of the remote node 16. The SCM Agent 220, through the properties module
280, preferably invokes a shell script to create the properties file. A shell script is used to create the
properties file so that a user or system administrator can modify the script to have more control over
what properties of the node 16 will be included.

The SCM Agent 220 then creates a properties object (step 315), which may comprise a Java
object, containing as attributes the values specified in the properties file created in step 310. Creating
a properties object (step 315) may be accomplished by instantiating a properties class and populating
the properties object attributes with the values specified by the properties file, by a constructor call, or
through other methods of object creation. In a preferred embodiment of the present invention, the
SCM Agent 220 invokes a read-properties method of a properties class to populate the properties
object with the values from the properties file created upon startup of the SCM Agent 220 in step 310.
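
Step 315 might be sketched as follows, assuming the properties file uses the key=value format
understood by java.util.Properties. The file format is an assumption, since the text does not
specify it, and the class name NodePropertiesObject and its methods are hypothetical stand-ins for
the properties class and its read-properties method.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical properties class; Serializable so that an instance could later be
// passed as an argument of a remote method call.
class NodePropertiesObject implements Serializable {
    private static final long serialVersionUID = 1L;

    // Attribute values read from the properties file.
    private final Properties values = new Properties();

    // Populates this object from a key=value properties file (assumed format).
    void readProperties(Path file) throws IOException {
        try (Reader in = Files.newBufferedReader(file)) {
            values.load(in);
        }
    }

    // Returns the value of one property, or null if it was not in the file.
    String get(String name) {
        return values.getProperty(name);
    }
}
```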

The SCM Agent 220 proceeds to initiate contact (step 320) with the distributed task facility
240 on the CMS 14. The SCM Agent 220 may initiate contact with the distributed task facility 240
by way of invoking a method on the SCM 12. In a preferred embodiment of the present invention, the
SCM Agent 220 initiates contact with the distributed task facility 240 by using a standard Java Remote
Method Invocation registry mechanism and calling a method on the distributed task facility 240, passing
the properties object (step 325) as one of the arguments of the method call.
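
Steps 320 and 325 can be sketched with the standard java.rmi API. The remote interface
TaskFacility, its reportIn method, the registry name "DistributedTaskFacility" and the class
AgentReporter are all hypothetical, since the text does not give the actual interface of the
distributed task facility 240. A java.util.Properties table stands in for the properties object.

```java
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.Registry;
import java.util.Properties;

// Hypothetical remote interface exposed by the distributed task facility.
interface TaskFacility extends Remote {
    // Called by an agent to report in; the properties travel as a serialized argument.
    void reportIn(String nodeName, Properties nodeProperties) throws RemoteException;
}

// Hypothetical agent-side helper that initiates contact with the facility.
class AgentReporter {
    // Assumed name under which the facility is bound in the RMI registry.
    static final String FACILITY_NAME = "DistributedTaskFacility";

    // Looks up the facility in the given registry (step 320) and passes the
    // node properties as an argument of the remote call (step 325).
    static void reportIn(Registry registry, String nodeName, Properties nodeProperties)
            throws RemoteException, NotBoundException {
        TaskFacility facility = (TaskFacility) registry.lookup(FACILITY_NAME);
        facility.reportIn(nodeName, nodeProperties);
    }
}
```

An agent would typically obtain the Registry argument with
java.rmi.registry.LocateRegistry.getRegistry(host, port), where host and port identify the CMS 14.
The call originates on the node, so the CMS does not need to poll the agent.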

In one embodiment, the method 300 may also include a step for authenticating the call from the 
SCM Agent 220 to the distributed task facility 240 using standard Java security mechanisms. This 
authentication may be performed to ensure that the SCM Agent 220 is properly authorized to call the 
distributed task facility 240 and that the distributed task facility 240 being called by the SCM Agent 
220 is the correct distributed task facility 240 associated with the remote node 16. Once contact is
made and authenticated between the SCM Agent 220 and the distributed task facility 240, the SCM 
Agent 220 passes the properties object (step 325) to the distributed task facility 240. 

Upon receiving the properties object from the SCM Agent 220, the distributed task facility
240 writes (step 330) the contents of the properties object to a central properties file (in the secondary
storage 141, for example) on the CMS 14. The central properties file is preferably then available to
other functions or modules of the SCM 12, including the node manager 250. The distributed task
facility 240 logs (step 335) the transaction of receiving and writing the properties object data to the log
manager 255 to indicate that an SCM Agent 220 has restarted and reported in.
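
Steps 330 and 335 might look roughly like the sketch below. It rests on several assumptions: the
reported values are held in a java.util.Properties table, one file per node is written (the text
speaks only of a central properties file), and java.util.logging stands in for the log manager 255.
The class name CentralPropertiesWriter and the logger name are hypothetical.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.logging.Logger;

// Hypothetical CMS-side helper; java.util.logging is a stand-in for the log manager.
class CentralPropertiesWriter {
    private static final Logger LOG = Logger.getLogger("scm.sketch");

    // Writes the reported values to <directory>/<nodeName>.properties (step 330)
    // and logs that the agent has reported in (step 335). Returns the file written.
    static Path record(Path directory, String nodeName, Properties reported)
            throws IOException {
        Path target = directory.resolve(nodeName + ".properties");
        try (Writer out = Files.newBufferedWriter(target)) {
            reported.store(out, "Reported by agent on " + nodeName);
        }
        LOG.info("Agent on " + nodeName + " restarted and reported in");
        return target;
    }
}
```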

After logging the transaction (step 335), the distributed task facility 240 checks to determine 
if there were any outstanding tasks (step 340), assigned to the SCM Agent 220 prior to contact being 
initiated with the distributed task facility 240 by the SCM Agent 220, for which the distributed task 
facility 240 is still awaiting a response from the SCM Agent 220. If so, then the distributed task facility
240 preferably flags such tasks as failed. The tasks are considered failed since the SCM Agent 220 
has restarted since the tasks were assigned to the SCM Agent 220 without the SCM Agent 220 
previously noting the completion of such tasks to the distributed task facility 240. 
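
The check in step 340 can be sketched as a simple scan over the tasks the facility is tracking.
The class TaskTracker, its Status values and its method names are hypothetical; the text does not
describe how the distributed task facility 240 stores task state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory bookkeeping for tasks assigned to agents.
class TaskTracker {
    enum Status { PENDING, COMPLETED, FAILED }

    static final class Task {
        final String id;
        final String nodeName;
        Status status = Status.PENDING;

        Task(String id, String nodeName) {
            this.id = id;
            this.nodeName = nodeName;
        }
    }

    private final List<Task> tasks = new ArrayList<>();

    // Records a task assigned to the agent on the given node.
    Task assign(String id, String nodeName) {
        Task t = new Task(id, nodeName);
        tasks.add(t);
        return t;
    }

    // Called when the agent on nodeName reports in after a restart: any task that is
    // still awaiting a response from that agent is flagged as failed (step 340).
    List<Task> failOutstanding(String nodeName) {
        List<Task> failed = new ArrayList<>();
        for (Task t : tasks) {
            if (t.nodeName.equals(nodeName) && t.status == Status.PENDING) {
                t.status = Status.FAILED;
                failed.add(t);
            }
        }
        return failed;
    }
}
```

Tasks already completed, and tasks assigned to other nodes, are left untouched, so the facility no
longer waits indefinitely on work that a restarted agent can never report.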

The steps of the method 300 can be implemented with hardware or by execution of programs, 
modules or scripts. The programs, modules or scripts can be stored or embodied on one or more 
computer readable mediums in a variety of formats, such as source code, object code or executable 
code, for example. The computer readable mediums may include, for example, both storage devices, 
such as the CMS 14 memory or secondary storage device 141, and signals. Exemplary computer
readable storage devices include conventional computer system RAM (random access memory), ROM 
(read only memory), EPROM (erasable, programmable ROM), EEPROM (electrically erasable, 
programmable ROM), and magnetic or optical disks or tapes. Exemplary computer readable signals, 
whether modulated using a carrier or not, are signals that a computer system hosting or running the 
described methods can be configured to access, including signals downloaded through the Internet or 
other networks. 

The terms and descriptions used herein are set forth by way of illustration only and are not 
meant as limitations. Those skilled in the art will recognize that many variations are possible within 
the spirit and scope of the invention as defined in the following claims, and their equivalents, in which
all terms are to be understood in their broadest possible sense unless otherwise indicated.